Identification of Chinese Personal Names in Unrestricted Texts

Authors

  • Lawrence Y. L. Cheung
  • Benjamin Ka-Yin T'sou
  • Maosong Sun
Abstract

Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation and, if not properly addressed, can affect other NLP tasks such as information retrieval. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) presents the relevant processing strategies. The geographical differences in Chinese personal names between Beijing and Hong Kong are highlighted at the end. The comparison shows that variation in names across different Chinese communities is a critical factor in designing Chinese personal name identification algorithms.

Similar resources

Identification and Classification of Proper Nouns in Chinese Texts

Various strategies are proposed to identify and classify three types of proper nouns in Chinese texts. Clues from character, sentence and paragraph levels are employed to resolve Chinese personal names. Character, Syllable and Frequency Conditions are presented to treat transliterated personal names. To deal with organization names, keywords, prefix, word association and parts-of-speech are app...

Evaluating Chinese-English Translation Systems for Personal Name Coverage

This paper discusses the challenges which Chinese-English machine translation (MT) systems face in translating personal names. We show that the translation of names between Chinese and English is complicated by different factors, including orthographic, phonetic, geographic and social ones. Four existing systems were tested for their capability in translating personal names from Chinese to Engl...

A Literary Anthroponomastics of Three Selected African Novels: A Cross Cultural Perspective

Names as markers of identity are a source of a wide variety of information. This paper explores the names of characters to show the sociocultural factors which influence the choice of names and the effects that the names of these characters have on the roles they play. Using a variety of personal names from Ayi Kwei Armah’s Fragments, Buchi Emecheta’s The Joys of Motherhood, a...

Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition

In this paper, we analyze Chinese personal names to model their statistical phonotactic characteristics for continuous Chinese speech recognition. The analysis showed language-specific characteristics of Chinese personal names and strongly suggested the advantage of character-unit oriented modeling. A hierarchical language model was composed by reflecting statistical phonotactic characteristics ...

Knowledge Extraction For Identification Of Chinese Organization Names

In this paper, a knowledge extraction process was proposed to extract the knowledge for identifying Chinese organization names. The knowledge extraction process utilizes the structure property, statistical property as well as partial linguistic knowledge of the organization names to extract new organizations from domain texts. The knowledge extraction processes were experimented on large amount...

Publication date: 2002